Transport Usage

To run on remote machines, remotemanager uses a file-sending system referred to internally as Transport. For most use cases you will not need to interact with these structures directly, but you may find their functions helpful for controlling files.

Transport types

The base Transport class is not a useful structure on its own, and you won’t get very far by using it in its raw form. It exists to give a common set of methods to all subclasses. The primary subclass is transport.rsync.

[1]:
# dev note: be careful editing this tutorial,
# it's _very_ sensitive to files and folders already existing,
# they must be cleared prior to any run, else it will cause the CI to fail

from remotemanager.transport import rsync

Transport works with a queue system, and distinguishes between push and pull. First, initialise your transport class with the flags that you would like rsync to use:

[2]:
tr = rsync(flags='auv')

To actually transfer files, you must first queue them. Transport entities consider your current machine as the local or origin point, and the destination as the remote or target point. First, let's create some folders and files for demonstration:

[3]:
from remotemanager import URL

url = URL()

url.cmd('rm -r temp_trn_local', raise_errors=False)
url.cmd('rm -r temp_trn_remote', raise_errors=False)
url.utils.mkdir('temp_trn_local')
url.utils.mkdir('temp_trn_remote')

url.utils.touch('temp_trn_local/send_me')
url.utils.touch('temp_trn_local/send_me_also')

url.utils.touch('temp_trn_remote/fetch_me')
url.utils.touch('temp_trn_remote/fetch_me_too')
url.utils.touch('temp_trn_remote/fetch_me_differently')

[4]:
print(url.utils.ls('temp_trn_local'))
['send_me', 'send_me_also']
[5]:
print(url.utils.ls('temp_trn_remote'))
['fetch_me', 'fetch_me_differently', 'fetch_me_too']

Now we have 2 files on our “local” machine that we want to send, and 3 files on our “remote” machine that we need to fetch. Let's start with pushing. To do this, we use the method queue_for_push.

It takes its arguments in the order files, local, remote:

[6]:
tr.queue_for_push(['send_me', 'send_me_also'], 'temp_trn_local', 'temp_trn_remote')

With this done, we can see the transfers that are ready to occur, either by accessing the transfers property or by using the print_transfers method, which formats them for you:

[7]:
tr.transfers
[7]:
{'/home/test/remotemanager/docs/source/tutorials/temp_trn_local/>temp_trn_remote/': ['send_me',
  'send_me_also']}
[8]:
tr.print_transfers()
transfer 1:
origin: /home/test/remotemanager/docs/source/tutorials/temp_trn_local/
target: temp_trn_remote/
        (1/2) send_me
        (2/2) send_me_also

Here we can see a single transfer that is ready to occur, which represents one rsync call. Before executing, we can preview the commands that will be run by calling the transfer method with dry_run=True:

[9]:
tr.transfer(dry_run=True)
[9]:
[rsync -auv --checksum /home/test/remotemanager/docs/source/tutorials/temp_trn_local/{send_me,send_me_also} temp_trn_remote/]

This looks good, let's go:

[10]:
tr.transfer()
Transferring 2 Files... Done

Now check the “remote” folder to see what it looks like:

[11]:
url.utils.ls('temp_trn_remote')
[11]:
['fetch_me', 'fetch_me_differently', 'fetch_me_too', 'send_me', 'send_me_also']

The files have been sent as expected.

More complex movement

You may be aware that a single rsync call cannot handle a many-to-many situation, where different files need to go to different folders. Handling this is the greatest strength of the Transport system. Because transfers must be queued, logic can be applied before any command is executed, and the minimum number of calls can be made.

In the following example we have 3 files to fetch from the “remote”. Let's assume that we want one of them to go to a different folder; Transport handles this for you:
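
The grouping idea can be illustrated without remotemanager at all. The following is a minimal sketch, not the library's implementation: the TransferQueue class and its method names are hypothetical, it omits details such as the --checksum flag and absolute paths, and it only builds command strings rather than running anything.

```python
from collections import defaultdict


class TransferQueue:
    """Hypothetical stand-in for a queueing transport (not remotemanager code)."""

    def __init__(self, flags="auv"):
        self.flags = flags
        # maps (origin, target) -> list of filenames, in insertion order
        self._queue = defaultdict(list)

    def add(self, files, origin, target):
        """Queue one filename or a list of filenames for an origin/target pair."""
        if isinstance(files, str):
            files = [files]
        for name in files:
            # skip duplicates so a file is never listed twice in one command
            if name not in self._queue[(origin, target)]:
                self._queue[(origin, target)].append(name)

    def commands(self):
        """Return one rsync command string per distinct origin/target pair."""
        cmds = []
        for (origin, target), files in self._queue.items():
            # several files from one folder are collapsed with brace expansion
            names = files[0] if len(files) == 1 else "{" + ",".join(files) + "}"
            cmds.append(f"rsync -{self.flags} {origin}/{names} {target}/")
        return cmds


queue = TransferQueue()
queue.add(["fetch_me", "fetch_me_too"], "temp_trn_remote", "temp_trn_local")
queue.add("fetch_me_differently", "temp_trn_remote", "temp_trn_local_different")

for cmd in queue.commands():
    print(cmd)
```

Three files are queued, but because two of them share an origin and a target, only two commands are produced.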

[12]:
tr.queue_for_pull(['fetch_me', 'fetch_me_too'], 'temp_trn_local', 'temp_trn_remote')

url.utils.mkdir('temp_trn_local_different')  # create a different target dir for this file
tr.queue_for_pull('fetch_me_differently', 'temp_trn_local_different', 'temp_trn_remote')

Note

Pay close attention to the folder ordering. While we are pulling from the remote, Transport itself is still a connection from the “local” to the “remote”. Hence, the folder order does not change.

Let's look at our transfers:

[13]:
tr.print_transfers()
transfer 1:
origin: temp_trn_remote/
target: /home/test/remotemanager/docs/source/tutorials/temp_trn_local/
        (1/2) fetch_me
        (2/2) fetch_me_too
transfer 2:
origin: temp_trn_remote/
target: /home/test/remotemanager/docs/source/tutorials/temp_trn_local_different/
        (1/1) fetch_me_differently

And the commands:

[14]:
tr.transfer(dry_run=True)
[14]:
[rsync -auv --checksum temp_trn_remote/{fetch_me,fetch_me_too} /home/test/remotemanager/docs/source/tutorials/temp_trn_local/,
 rsync -auv --checksum temp_trn_remote/fetch_me_differently /home/test/remotemanager/docs/source/tutorials/temp_trn_local_different/]

Now execute, and look in the folders:

[15]:
tr.transfer()
Transferring 3 Files in 2 Transfers... Done
[16]:
print(url.utils.ls('temp_trn_local'))
['fetch_me', 'fetch_me_too', 'send_me', 'send_me_also']
[17]:
print(url.utils.ls('temp_trn_local_different'))
['fetch_me_differently']

Looks like all our files have been brought back to the correct place!

Bash Compatibility Mode

All Transports can be provided with an argument dir_mode, either at init (when calling rsync(..., dir_mode=True)) or on the transfer call (transfer(dir_mode=True)). In most cases, you will not have access to the actual transfer call, so it is best to set it at init, or update it via the dir_mode property if needed.

If True, any transfer will have an extra step added where the target files are first copied to a temporary directory, then transferred via *. This avoids relying on bash brace expansion to generate the command, the behaviour of which can vary between machines.
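
The staging step can be pictured with the standard library alone. This is a rough sketch under stated assumptions, not how remotemanager implements dir_mode: the helper stage_for_transfer is hypothetical, it assumes the files start on the local machine (a push), and it only builds the command string instead of running rsync.

```python
import shutil
import tempfile
from pathlib import Path


def stage_for_transfer(files, origin, target, flags="auv"):
    """Copy files into a temporary directory and build a wildcard rsync command.

    Hypothetical helper: returns (staging_dir, command) and does not run rsync.
    """
    staging = Path(tempfile.mkdtemp(prefix="transfer_stage_"))
    for name in files:
        shutil.copy2(Path(origin) / name, staging / name)
    # the shell expands "*" itself, so no brace expansion is required
    command = f"rsync -{flags} {staging}/* {target}/"
    return staging, command


# create two empty files to act as the "local" content
source = Path(tempfile.mkdtemp(prefix="transfer_src_"))
for name in ("send_me", "send_me_also"):
    (source / name).touch()

staging, command = stage_for_transfer(
    ["send_me", "send_me_also"], source, "temp_trn_remote"
)
staged = sorted(p.name for p in staging.iterdir())
print(staged)
print(command)

# tidy up the demonstration directories
shutil.rmtree(staging)
shutil.rmtree(source)
```

The resulting command contains a single wildcard and no braces, at the cost of one extra copy of each file.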

Progress

If you have used rsync before, you may be aware that there is a --progress option. This prints a continuous update stream as the files are transferred.

When creating an rsync object, you can enable this for your terminal by setting progress=True on the initial call.

We can demonstrate this here using a Dataset:

[18]:
from remotemanager import Dataset
from remotemanager.transport import rsync

def f(i):
    return i

ds = Dataset(f, transport=rsync(progress=True), skip=False)

ds.append_run({"i": 1})
ds.append_run({"i": 2})
appended run runner-0
appended run runner-1
[19]:
ds.run()
Staging Dataset... Staged 2/2 Runners
Transferring for 2/2 Runners
Transferring 7 Files
sending incremental file list
dataset-a6e26708-master.sh

            377 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=6/7)
dataset-a6e26708-repo.py

          5.60K 100%    5.34MB/s    0:00:00 (xfr#2, to-chk=5/7)
dataset-a6e26708-repo.sh

            444 100%  433.59kB/s    0:00:00 (xfr#3, to-chk=4/7)
dataset-a6e26708-runner-0-jobscript.sh

            137 100%  133.79kB/s    0:00:00 (xfr#4, to-chk=3/7)
dataset-a6e26708-runner-0-run.py

            987 100%  963.87kB/s    0:00:00 (xfr#5, to-chk=2/7)
dataset-a6e26708-runner-1-jobscript.sh

            137 100%  133.79kB/s    0:00:00 (xfr#6, to-chk=1/7)
dataset-a6e26708-runner-1-run.py

            987 100%  963.87kB/s    0:00:00 (xfr#7, to-chk=0/7)

sent 9.29K bytes  received 149 bytes  18.89K bytes/sec
total size is 8.67K  speedup is 0.92
Done
Remotely executing 2/2 Runners
[19]:
True

You can, of course, override this behaviour with verbose=False:

[20]:
ds.run(verbose=False, force=True)
[20]:
True

Advanced Usage

Contrary to the note regarding the folder order, there is one further method, add_transfer, which inverts the folder ordering. In fact, both queueing methods call this method internally, acting as formatters for its arguments.

This method is not intended to be called by the user, but is left as a non-private function for those who prefer its behaviour.

Instead of passing files, local, remote, you must pass files, origin, target, mode. This takes a file-centric view, and thus for a pull, the origin is the remote dir. The mode simply tells Transport where to put the structures for connecting to the remote, and can either be “push” or “pull”:

[21]:
tr.add_transfer('fetch_me', 'temp_trn_remote', 'temp_trn_local', 'pull')
[22]:
tr.transfer(dry_run=True)
[22]:
[rsync -auv --checksum temp_trn_remote/fetch_me /home/test/remotemanager/docs/source/tutorials/temp_trn_local/]

As you can see, the transfer is created in the intended way, despite the “swapped” folders. You may find this convention more sensible and prefer to use it. As the queue functions exist solely to call this function, it should remain safe to use for those who wish to.
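
The relationship between the two argument conventions can be summarised in a few lines. This is an illustrative sketch only: to_origin_target is a hypothetical function, not part of remotemanager, and it simply shows how a (local, remote) pair could be reordered into (origin, target) depending on the mode.

```python
def to_origin_target(local, remote, mode):
    """Translate a (local, remote) folder pair into (origin, target)."""
    if mode == "push":
        # files start locally and end up on the remote
        return local, remote
    if mode == "pull":
        # files start on the remote and end up locally
        return remote, local
    raise ValueError(f"mode must be 'push' or 'pull', got {mode!r}")


print(to_origin_target("temp_trn_local", "temp_trn_remote", "push"))
print(to_origin_target("temp_trn_local", "temp_trn_remote", "pull"))
```

For a pull, the remote folder becomes the origin, which matches the argument order used in the add_transfer call above.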

Naming Conventions

For reference, the below table sums up the naming convention within the source, for those who want to do further reading:

name      meaning
local     “local” folder, regardless of mode of use
remote    “remote” folder, regardless of mode of use
origin    starting folder for the files; the first folder in an rsync command
target    destination folder for the files; the second folder in an rsync command

Note

Be aware of the argument expansion limitation that exists in rsync versions below 3. If you get errors during transfer, check that rsync --version reports version 3 or later.
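
If you would rather check the version from Python, something like the following may help. It is a sketch, not part of remotemanager: both function names are hypothetical, and it assumes the first line of the rsync --version output has the usual "rsync  version X.Y.Z" form, which may not hold for every rsync-compatible tool. The sample string is illustrative only.

```python
import re
import shutil
import subprocess


def parse_rsync_major(version_text):
    """Return the major version from `rsync --version` output, or None."""
    match = re.search(r"rsync\s+version\s+v?(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def installed_rsync_major():
    """Query the rsync found on PATH; None if it is missing or unparseable."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(
        ["rsync", "--version"], capture_output=True, text=True
    )
    return parse_rsync_major(result.stdout)


# illustrative sample of the expected first line of output
sample = "rsync  version 3.2.7  protocol version 31"
print(parse_rsync_major(sample))
```

Calling installed_rsync_major() on your own machine returns the major version of whichever rsync is first on your PATH, or None if it cannot be determined.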